Overview

Dataset statistics

Number of variables            17
Number of observations         1000
Missing cells                  0
Missing cells (%)              0.0%
Duplicate rows                 0
Duplicate rows (%)             0.0%
Total size in memory           132.9 KiB
Average record size in memory  136.1 B

Variable types

CAT  10
NUM  7

Reproduction

Analysis started        2020-08-10 12:30:16.955290
Analysis finished       2020-08-10 12:30:29.832120
Duration                12.88 seconds
Version                 pandas-profiling v2.8.0
Command line            pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration  config.yaml

Warnings

gross margin percentage has constant value "4.7619047619999995"  [Constant]
Date has a high cardinality: 89 distinct values  [High cardinality]
Time has a high cardinality: 506 distinct values  [High cardinality]
Total is highly correlated with Tax 5% and 2 other fields  [High correlation]
Tax 5% is highly correlated with Total and 2 other fields  [High correlation]
cogs is highly correlated with Tax 5% and 2 other fields  [High correlation]
gross income is highly correlated with Tax 5% and 2 other fields  [High correlation]
City is highly correlated with Branch  [High correlation]
Branch is highly correlated with City  [High correlation]
Time is uniformly distributed  [Uniform]
Invoice ID has unique values  [Unique]
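The correlated group (Tax 5%, Total, cogs, gross income) and the constant gross margin percentage are consistent with these columns being computed from Unit price and Quantity. The sketch below assumes relations inferred from the sample rows, not documented anywhere in this report: cogs = Unit price × Quantity, Tax 5% = 5% of cogs, Total = cogs + Tax 5%, gross income = Tax 5%, and gross margin percentage = gross income / Total × 100. The function `derive` is hypothetical; it checks the assumed formulas against row 0 of the sample.

```python
# Hypothetical reconstruction: these formulas are inferred from the sample rows,
# not taken from any documentation of the dataset.
def derive(unit_price: float, quantity: int) -> dict:
    cogs = unit_price * quantity             # assumed: cost of goods sold
    tax = 0.05 * cogs                        # assumed: the "Tax 5%" column
    total = cogs + tax                       # assumed: the "Total" column
    gross_income = tax                       # assumed: "gross income" equals "Tax 5%"
    margin_pct = gross_income / total * 100  # assumed: "gross margin percentage"
    return {
        "cogs": cogs,
        "tax": tax,
        "total": total,
        "gross_income": gross_income,
        "margin_pct": margin_pct,
    }


# Row 0 of the sample: Unit price 74.69, Quantity 7
row = derive(74.69, 7)
print(f"cogs={row['cogs']:.2f} tax={row['tax']:.4f} total={row['total']:.4f}")
print(f"margin={row['margin_pct']:.10f} 100/21={100 / 21:.10f}")
```

Under these assumptions the margin is 0.05 / 1.05 × 100 = 100/21 ≈ 4.7619 for every row, and Tax 5%, cogs and Total stand in the fixed ratio 1 : 20 : 21, which would account for both the Constant and the High correlation warnings.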

Variables

Invoice ID
Categorical

UNIQUE

Distinct count  1000
Unique (%)      100.0%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value               Count  Frequency (%)
166-19-2553         1      0.1%
244-08-0162         1      0.1%
468-99-7231         1      0.1%
712-39-0363         1      0.1%
575-30-8091         1      0.1%
232-16-2483         1      0.1%
573-58-9734         1      0.1%
529-56-3974         1      0.1%
401-18-8016         1      0.1%
585-11-6748         1      0.1%
Other values (990)  990    99.0%

Length

Max length     11
Median length  11
Mean length    11
Min length     11

Branch
Categorical

HIGH CORRELATION

Distinct count  3
Unique (%)      0.3%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value  Count  Frequency (%)
A      340    34.0%
B      332    33.2%
C      328    32.8%

Length

Max length     1
Median length  1
Mean length    1
Min length     1

City
Categorical

HIGH CORRELATION

Distinct count  3
Unique (%)      0.3%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value      Count  Frequency (%)
Yangon     340    34.0%
Mandalay   332    33.2%
Naypyitaw  328    32.8%

Length

Max length     9
Median length  8
Mean length    7.648
Min length     6
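The length statistics summarize the number of characters in each value, weighted by how often the value occurs. Because City has only three values, they can be recomputed from its frequency table alone. A minimal sketch (the helper `length_stats` is illustrative, not part of pandas-profiling):

```python
from statistics import mean, median

# Value counts copied from the City frequency table
city_counts = {"Yangon": 340, "Mandalay": 332, "Naypyitaw": 328}


def length_stats(counts: dict) -> dict:
    """Max/median/mean/min string length, counting each value once per occurrence."""
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


stats = length_stats(city_counts)
print(stats)
```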

Customer type
Categorical

Distinct count  2
Unique (%)      0.2%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value   Count  Frequency (%)
Member  501    50.1%
Normal  499    49.9%

Length

Max length     6
Median length  6
Mean length    6
Min length     6

Gender
Categorical

Distinct count  2
Unique (%)      0.2%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value   Count  Frequency (%)
Female  501    50.1%
Male    499    49.9%

Length

Max length     6
Median length  6
Mean length    5.002
Min length     4

Product line
Categorical

Distinct count  6
Unique (%)      0.6%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value                   Count  Frequency (%)
Fashion accessories     178    17.8%
Food and beverages      174    17.4%
Electronic accessories  170    17.0%
Sports and travel       166    16.6%
Home and lifestyle      160    16.0%
Health and beauty       152    15.2%

Length

Max length     22
Median length  18
Mean length    18.54
Min length     17
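The frequency tables list the most common values and fold the rest into an "Other values (k)" row, where k is the number of distinct values not listed. A sketch of that construction with `collections.Counter`; the function name and the `top_n` parameter are illustrative assumptions, not pandas-profiling's API. Applied to the Product line counts with `top_n=5`, the sixth category becomes "Other values (1)".

```python
from collections import Counter


def frequency_table(counts: Counter, top_n: int):
    """Return (label, count, percent) rows: the top_n values plus an 'Other values' row."""
    total = sum(counts.values())
    rows = [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common(top_n)]
    rest = counts.most_common()[top_n:]  # every value not shown individually
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


# Counts copied from the Product line frequency table
product_line = Counter({
    "Fashion accessories": 178,
    "Food and beverages": 174,
    "Electronic accessories": 170,
    "Sports and travel": 166,
    "Home and lifestyle": 160,
    "Health and beauty": 152,
})

rows = frequency_table(product_line, top_n=5)
for label, n, pct in rows:
    print(f"{label:<24}{n:>6}{pct:>7.1f}%")
```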

Unit price
Real number (ℝ≥0)

Distinct count  943
Unique (%)      94.3%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            55.67213
Minimum         10.08
Maximum         99.96
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    10.08
5th percentile             15.279
Q1                         32.875
median                     55.23
Q3                         77.935
95th percentile            97.222
Maximum                    99.96
Range                      89.88
Interquartile range (IQR)  45.06

Descriptive statistics

Standard deviation               26.49462835
Coefficient of variation (CV)    0.4759047004
Kurtosis                         -1.218591428
Mean                             55.67213
Median Absolute Deviation (MAD)  22.505
Skewness                         0.007077447853
Sum                              55672.13
Variance                         701.9653313
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value               Count  Frequency (%)
83.77               3      0.3%
64.08               2      0.2%
32.32               2      0.2%
21.58               2      0.2%
45.38               2      0.2%
48.5                2      0.2%
26.26               2      0.2%
21.12               2      0.2%
39.75               2      0.2%
24.74               2      0.2%
Other values (933)  979    97.9%

Minimum 5 values

Value  Count  Frequency (%)
10.08  1      0.1%
10.13  1      0.1%
10.16  1      0.1%
10.17  1      0.1%
10.18  1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
99.96  2      0.2%
99.92  1      0.1%
99.89  1      0.1%
99.83  1      0.1%
99.82  2      0.2%
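The histograms use fixed size bins: the interval from the minimum to the maximum is split into `bins` equal-width intervals (for Unit price, 89.88 / 10 = 8.988 units wide). A pure-Python sketch of that binning, following the numpy.histogram convention of half-open bins with the last bin closed; it is an illustration, not the plotting code pandas-profiling runs, and the function name is hypothetical. Values lying exactly on an interior edge may land differently than in numpy because of floating-point rounding.

```python
def fixed_size_bins(values, bins=10):
    """Split [min, max] into `bins` equal-width intervals and count values per bin.

    Bins are half-open [left, right), except the last, which also includes the
    maximum (the convention numpy.histogram uses).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bin index of v; if all values are equal (width 0), everything goes to bin 0
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, bins - 1)] += 1  # clamp the maximum into the last bin
    return edges, counts


# Hypothetical example data: the integers 1..10 fill ten bins, one value each
edges, counts = fixed_size_bins(list(range(1, 11)), bins=10)
print(counts)
```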
 

Quantity
Real number (ℝ≥0)

Distinct count  10
Unique (%)      1.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            5.51
Minimum         1
Maximum         10
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    1
5th percentile             1
Q1                         3
median                     5
Q3                         8
95th percentile            10
Maximum                    10
Range                      9
Interquartile range (IQR)  5

Descriptive statistics

Standard deviation               2.923430595
Coefficient of variation (CV)    0.5305681661
Kurtosis                         -1.215547226
Mean                             5.51
Median Absolute Deviation (MAD)  2
Skewness                         0.01294104802
Sum                              5510
Variance                         8.546446446
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value  Count  Frequency (%)
10     119    11.9%
1      112    11.2%
4      109    10.9%
7      102    10.2%
5      102    10.2%
6      98     9.8%
9      92     9.2%
2      91     9.1%
3      90     9.0%
8      85     8.5%

Minimum 5 values

Value  Count  Frequency (%)
1      112    11.2%
2      91     9.1%
3      90     9.0%
4      109    10.9%
5      102    10.2%

Maximum 5 values

Value  Count  Frequency (%)
10     119    11.9%
9      92     9.2%
8      85     8.5%
7      102    10.2%
6      98     9.8%
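Because Quantity takes only ten distinct values, its full distribution can be rebuilt from the frequency table and the reported statistics recomputed from it. The sketch assumes the conventions pandas uses: sample variance with an n − 1 denominator, linear-interpolation quantiles (the default of Series.quantile), and the bias-adjusted estimators behind Series.skew and Series.kurt. That pandas-profiling computes its figures this way is an assumption, although these formulas reproduce the values reported above (mean 5.51, variance 8.546446446, Q1/median/Q3 = 3/5/8, MAD 2, skewness 0.01294104802, kurtosis -1.215547226). The helper `quantile` is illustrative.

```python
import math
from statistics import median

# Value counts copied from the Quantity frequency table
counts = {1: 112, 2: 91, 3: 90, 4: 109, 5: 102, 6: 98, 7: 102, 8: 85, 9: 92, 10: 119}
data = sorted(v for v, c in counts.items() for _ in range(c))
n = len(data)  # 1000 observations


def quantile(sorted_data, q):
    """Linear interpolation between the two nearest ranks (pandas' default method)."""
    pos = (len(sorted_data) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


mean = sum(data) / n
s2 = sum((x - mean) ** 2 for x in data)  # sums of powers of deviations from the mean
s3 = sum((x - mean) ** 3 for x in data)
s4 = sum((x - mean) ** 4 for x in data)

variance = s2 / (n - 1)                   # sample variance (divides by n - 1)
std = math.sqrt(variance)
cv = std / mean                           # coefficient of variation
med = median(data)
mad = median(abs(x - med) for x in data)  # median absolute deviation around the median

# Bias-adjusted skewness and excess kurtosis (the estimators behind
# pandas' Series.skew and Series.kurt)
skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
kurtosis = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"mean={mean:.2f} variance={variance:.9f} std={std:.9f} cv={cv:.10f}")
print(f"Q1={quantile(data, 0.25)} median={quantile(data, 0.5)} "
      f"Q3={quantile(data, 0.75)} MAD={mad}")
print(f"skewness={skewness:.10f} kurtosis={kurtosis:.9f}")
```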
 

Tax 5%
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count  990
Unique (%)      99.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            15.379369
Minimum         0.5085
Maximum         49.65
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    0.5085
5th percentile             1.955725
Q1                         5.924875
median                     12.088
Q3                         22.44525
95th percentile            39.1665
Maximum                    49.65
Range                      49.1415
Interquartile range (IQR)  16.520375

Descriptive statistics

Standard deviation               11.70882548
Coefficient of variation (CV)    0.7613332823
Kurtosis                         -0.0818847579
Mean                             15.379369
Median Absolute Deviation (MAD)  7.50875
Skewness                         0.892569805
Sum                              15379.369
Variance                         137.0965941
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value               Count  Frequency (%)
22.428              2      0.2%
8.377               2      0.2%
13.188              2      0.2%
4.464               2      0.2%
9.0045              2      0.2%
12.57               2      0.2%
10.326              2      0.2%
4.154               2      0.2%
39.48               2      0.2%
10.3635             2      0.2%
Other values (980)  980    98.0%

Minimum 5 values

Value   Count  Frequency (%)
0.5085  1      0.1%
0.6045  1      0.1%
0.627   1      0.1%
0.639   1      0.1%
0.699   1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
49.65  1      0.1%
49.49  1      0.1%
49.26  1      0.1%
48.75  1      0.1%
48.69  1      0.1%

Total
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count  990
Unique (%)      99.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            322.966749
Minimum         10.6785
Maximum         1042.65
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    10.6785
5th percentile             41.070225
Q1                         124.422375
median                     253.848
Q3                         471.35025
95th percentile            822.4965
Maximum                    1042.65
Range                      1031.9715
Interquartile range (IQR)  346.927875

Descriptive statistics

Standard deviation               245.8853351
Coefficient of variation (CV)    0.7613332823
Kurtosis                         -0.0818847579
Mean                             322.966749
Median Absolute Deviation (MAD)  157.68375
Skewness                         0.892569805
Sum                              322966.749
Variance                         60459.59802
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value               Count  Frequency (%)
175.917             2      0.2%
829.08              2      0.2%
189.0945            2      0.2%
470.988             2      0.2%
93.744              2      0.2%
216.846             2      0.2%
276.948             2      0.2%
87.234              2      0.2%
217.6335            2      0.2%
263.97              2      0.2%
Other values (980)  980    98.0%

Minimum 5 values

Value    Count  Frequency (%)
10.6785  1      0.1%
12.6945  1      0.1%
13.167   1      0.1%
13.419   1      0.1%
14.679   1      0.1%

Maximum 5 values

Value    Count  Frequency (%)
1042.65  1      0.1%
1039.29  1      0.1%
1034.46  1      0.1%
1023.75  1      0.1%
1022.49  1      0.1%

Date
Categorical

HIGH CARDINALITY

Distinct count  89
Unique (%)      8.9%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value              Count  Frequency (%)
2/7/2019           20     2.0%
2/15/2019          19     1.9%
3/2/2019           18     1.8%
3/14/2019          18     1.8%
1/8/2019           18     1.8%
1/25/2019          17     1.7%
1/26/2019          17     1.7%
1/23/2019          17     1.7%
3/5/2019           17     1.7%
3/9/2019           16     1.6%
Other values (79)  823    82.3%

Length

Max length     9
Median length  9
Mean length    8.677
Min length     8

Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count  506
Unique (%)      50.6%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value               Count  Frequency (%)
19:48               7      0.7%
14:42               7      0.7%
17:38               6      0.6%
11:51               5      0.5%
17:16               5      0.5%
19:44               5      0.5%
13:48               5      0.5%
17:36               5      0.5%
13:58               5      0.5%
19:30               5      0.5%
Other values (496)  945    94.5%

Length

Max length     5
Median length  5
Mean length    5
Min length     5
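Date and Time are profiled as categorical strings, which is why they trigger the high-cardinality warnings. One option is to parse them into timestamps before profiling, so that they can be analysed as dates and times (or reduced to features such as weekday and hour). A standard-library sketch; the format string is inferred from the values shown (month/day/year without zero padding, 24-hour HH:MM), and the helper `parse_timestamp` is hypothetical.

```python
from datetime import datetime


def parse_timestamp(date_str: str, time_str: str) -> datetime:
    """Combine a Date string (M/D/YYYY) and a Time string (HH:MM) into a datetime.

    strptime's %m and %d also accept values without a leading zero, e.g. "2/7/2019".
    """
    return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M")


# Date/Time pair taken from sample row 0
ts = parse_timestamp("1/5/2019", "13:08")
print(ts.isoformat(), ts.weekday(), ts.hour)  # weekday(): Monday == 0
```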

Payment
Categorical

Distinct count  3
Unique (%)      0.3%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value        Count  Frequency (%)
Ewallet      345    34.5%
Cash         344    34.4%
Credit card  311    31.1%

Length

Max length     11
Median length  7
Mean length    7.212
Min length     4

cogs
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count  990
Unique (%)      99.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            307.58738
Minimum         10.17
Maximum         993.0
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    10.17
5th percentile             39.1145
Q1                         118.4975
median                     241.76
Q3                         448.905
95th percentile            783.33
Maximum                    993
Range                      982.83
Interquartile range (IQR)  330.4075

Descriptive statistics

Standard deviation               234.1765096
Coefficient of variation (CV)    0.7613332823
Kurtosis                         -0.0818847579
Mean                             307.58738
Median Absolute Deviation (MAD)  150.175
Skewness                         0.892569805
Sum                              307587.38
Variance                         54838.63766
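cogs shares its coefficient of variation, kurtosis and skewness exactly with Tax 5%, Total and gross income. That is what one would expect if each column is a positive multiple of Tax 5% (cogs = 20 × Tax 5%, Total = 21 × Tax 5%; factors inferred from the reported values, not documented): multiplying a variable by c > 0 multiplies its mean and standard deviation by c and its variance by c², and leaves CV, skewness and kurtosis unchanged. A quick check against the reported figures:

```python
# Reported figures for Tax 5% (gross income has identical statistics)
tax_mean, tax_std, tax_var = 15.379369, 11.70882548, 137.0965941

# Reported figures for the columns ASSUMED (not documented) to be multiples of Tax 5%
reported = {
    "cogs": {"factor": 20, "mean": 307.58738, "std": 234.1765096, "var": 54838.63766},
    "Total": {"factor": 21, "mean": 322.966749, "std": 245.8853351, "var": 60459.59802},
}

for name, r in reported.items():
    c = r["factor"]
    # Scaling by c > 0 multiplies mean and standard deviation by c, variance by c**2
    print(name,
          round(c * tax_mean, 6), r["mean"],
          round(c * tax_std, 7), r["std"],
          round(c ** 2 * tax_var, 5), r["var"])

# CV = std / mean is scale-free, so it is the same for every multiple
for name, r in reported.items():
    print(name, r["std"] / r["mean"], tax_std / tax_mean)
```

The small differences in the last digits come from the report rounding each figure to about ten significant digits.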
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value               Count  Frequency (%)
263.76              2      0.2%
448.56              2      0.2%
207.27              2      0.2%
180.09              2      0.2%
206.52              2      0.2%
83.08               2      0.2%
167.54              2      0.2%
251.4               2      0.2%
89.28               2      0.2%
789.6               2      0.2%
Other values (980)  980    98.0%

Minimum 5 values

Value  Count  Frequency (%)
10.17  1      0.1%
12.09  1      0.1%
12.54  1      0.1%
12.78  1      0.1%
13.98  1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
993    1      0.1%
989.8  1      0.1%
985.2  1      0.1%
975    1      0.1%
973.8  1      0.1%

gross margin percentage
Categorical

CONSTANT
REJECTED

Distinct count  1
Unique (%)      0.1%
Missing         0
Missing (%)     0.0%
Memory size     7.8 KiB
Value        Count  Frequency (%)
4.761904762  1000   100.0%

Length

Max length     18
Median length  18
Mean length    18
Min length     18

gross income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count  990
Unique (%)      99.0%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            15.379369
Minimum         0.5085
Maximum         49.65
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    0.5085
5th percentile             1.955725
Q1                         5.924875
median                     12.088
Q3                         22.44525
95th percentile            39.1665
Maximum                    49.65
Range                      49.1415
Interquartile range (IQR)  16.520375

Descriptive statistics

Standard deviation               11.70882548
Coefficient of variation (CV)    0.7613332823
Kurtosis                         -0.0818847579
Mean                             15.379369
Median Absolute Deviation (MAD)  7.50875
Skewness                         0.892569805
Sum                              15379.369
Variance                         137.0965941
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value               Count  Frequency (%)
22.428              2      0.2%
8.377               2      0.2%
13.188              2      0.2%
4.464               2      0.2%
9.0045              2      0.2%
12.57               2      0.2%
10.326              2      0.2%
4.154               2      0.2%
39.48               2      0.2%
10.3635             2      0.2%
Other values (980)  980    98.0%

Minimum 5 values

Value   Count  Frequency (%)
0.5085  1      0.1%
0.6045  1      0.1%
0.627   1      0.1%
0.639   1      0.1%
0.699   1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
49.65  1      0.1%
49.49  1      0.1%
49.26  1      0.1%
48.75  1      0.1%
48.69  1      0.1%

Rating
Real number (ℝ≥0)

Distinct count  61
Unique (%)      6.1%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            6.9727
Minimum         4.0
Maximum         10.0
Zeros           0
Zeros (%)       0.0%
Memory size     7.8 KiB

Quantile statistics

Minimum                    4
5th percentile             4.295
Q1                         5.5
median                     7
Q3                         8.5
95th percentile            9.7
Maximum                    10
Range                      6
Interquartile range (IQR)  3

Descriptive statistics

Standard deviation               1.718580294
Coefficient of variation (CV)    0.2464727142
Kurtosis                         -1.151586839
Mean                             6.9727
Median Absolute Deviation (MAD)  1.5
Skewness                         0.009009648766
Sum                              6972.7
Variance                         2.953518228
[Histogram with fixed size bins (bins=10); image not preserved]

Common values

Value              Count  Frequency (%)
6                  26     2.6%
6.6                24     2.4%
9.5                22     2.2%
4.2                22     2.2%
8                  21     2.1%
6.2                21     2.1%
6.5                21     2.1%
5                  21     2.1%
5.1                21     2.1%
7                  20     2.0%
Other values (51)  781    78.1%

Minimum 5 values

Value  Count  Frequency (%)
4      11     1.1%
4.1    17     1.7%
4.2    22     2.2%
4.3    18     1.8%
4.4    17     1.7%

Maximum 5 values

Value  Count  Frequency (%)
10     5      0.5%
9.9    16     1.6%
9.8    19     1.9%
9.7    14     1.4%
9.6    17     1.7%

Interactions

[Pairwise interaction plots for the 7 numeric variables (49 panels); images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. This implies that for an exact linear relationship the slope of the line, i.e. its angle to the x-axis, does not affect r: any non-horizontal line gives r = ±1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
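That definition translates directly into code. The sketch below (function name illustrative) uses population moments; the 1/n factors cancel, so using n − 1 throughout gives the same r. It is applied to Tax 5% and Total from the first ten sample rows, where Total equals 21 × Tax 5% in every row, so r comes out as 1 up to floating-point rounding.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Raises ZeroDivisionError if either input is constant (r is undefined then).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


# Tax 5% and Total from the first ten sample rows
tax = [26.1415, 3.8200, 16.2155, 23.2880, 30.2085, 29.8865, 20.6520, 36.7800, 3.6260, 8.2260]
total = [548.9715, 80.2200, 340.5255, 489.0480, 634.3785, 627.6165, 433.6920, 772.3800, 76.1460, 172.7460]
print(pearson_r(tax, total))

# Invariance under a change of location and positive scale
print(pearson_r(tax, [3 * t + 2 for t in tax]))
```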

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
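A self-contained sketch of that recipe: rank each variable, giving tied values the average of the ranks they span (the usual convention), then apply Pearson's formula to the ranks. The function names are illustrative. For a monotonic but non-linear relationship such as y = x³, ρ is 1 while r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (the n's cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but non-linear
print(spearman_rho(x, y), pearson_r(x, y))
```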

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair of observations (i, j) is concordant if X and Y are ordered the same way (x_i < x_j and y_i < y_j, or x_i > x_j and y_i > y_j) and discordant if they are ordered oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2.
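Counting pairs as described, with tied pairs counted as neither concordant nor discordant, gives the τ-a variant. Library routines such as scipy.stats.kendalltau default to the tie-adjusted τ-b, which coincides with τ-a when there are no ties. A quadratic-time sketch (function name illustrative; fine for illustration, too slow for large columns):

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair is concordant when x and y move in the same direction, discordant
    when they move in opposite directions; tied pairs count as neither.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```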

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The usual empirical estimators for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
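A sketch of the bias correction, following the formulas usually given for Bergsma's estimator: φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and clipped at 0, and the table dimensions r and k are replaced by r − (r − 1)²/(n − 1) and k − (k − 1)²/(n − 1). It is an illustrative implementation, not pandas-profiling's own code, and it assumes a table with no all-zero rows or columns. The usage example assumes, as the matching counts (340/332/328) and the sample rows suggest, that Branch and City map one-to-one (A = Yangon, B = Mandalay, C = Naypyitaw).

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts.

    Assumes no row or column sums to zero (expected counts must be positive).
    """
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Branch (rows A, B, C) vs City (columns Yangon, Mandalay, Naypyitaw),
# ASSUMING the one-to-one mapping suggested by the counts and sample rows
branch_city = [[340, 0, 0], [0, 332, 0], [0, 0, 328]]
print(cramers_v_corrected(branch_city))
```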

Missing values

[Two missing-value plots; images not preserved]

Sample

First rows

Row | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating
0 | 750-67-8428 | A | Yangon | Member | Female | Health and beauty | 74.69 | 7 | 26.1415 | 548.9715 | 1/5/2019 | 13:08 | Ewallet | 522.83 | 4.761905 | 26.1415 | 9.1
1 | 226-31-3081 | C | Naypyitaw | Normal | Female | Electronic accessories | 15.28 | 5 | 3.8200 | 80.2200 | 3/8/2019 | 10:29 | Cash | 76.40 | 4.761905 | 3.8200 | 9.6
2 | 631-41-3108 | A | Yangon | Normal | Male | Home and lifestyle | 46.33 | 7 | 16.2155 | 340.5255 | 3/3/2019 | 13:23 | Credit card | 324.31 | 4.761905 | 16.2155 | 7.4
3 | 123-19-1176 | A | Yangon | Member | Male | Health and beauty | 58.22 | 8 | 23.2880 | 489.0480 | 1/27/2019 | 20:33 | Ewallet | 465.76 | 4.761905 | 23.2880 | 8.4
4 | 373-73-7910 | A | Yangon | Normal | Male | Sports and travel | 86.31 | 7 | 30.2085 | 634.3785 | 2/8/2019 | 10:37 | Ewallet | 604.17 | 4.761905 | 30.2085 | 5.3
5 | 699-14-3026 | C | Naypyitaw | Normal | Male | Electronic accessories | 85.39 | 7 | 29.8865 | 627.6165 | 3/25/2019 | 18:30 | Ewallet | 597.73 | 4.761905 | 29.8865 | 4.1
6 | 355-53-5943 | A | Yangon | Member | Female | Electronic accessories | 68.84 | 6 | 20.6520 | 433.6920 | 2/25/2019 | 14:36 | Ewallet | 413.04 | 4.761905 | 20.6520 | 5.8
7 | 315-22-5665 | C | Naypyitaw | Normal | Female | Home and lifestyle | 73.56 | 10 | 36.7800 | 772.3800 | 2/24/2019 | 11:38 | Ewallet | 735.60 | 4.761905 | 36.7800 | 8.0
8 | 665-32-9167 | A | Yangon | Member | Female | Health and beauty | 36.26 | 2 | 3.6260 | 76.1460 | 1/10/2019 | 17:15 | Credit card | 72.52 | 4.761905 | 3.6260 | 7.2
9 | 692-92-5582 | B | Mandalay | Member | Female | Food and beverages | 54.84 | 3 | 8.2260 | 172.7460 | 2/20/2019 | 13:27 | Credit card | 164.52 | 4.761905 | 8.2260 | 5.9

Last rows

Row | Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating
990 | 886-18-2897 | A | Yangon | Normal | Female | Food and beverages | 56.56 | 5 | 14.1400 | 296.9400 | 3/22/2019 | 19:06 | Credit card | 282.80 | 4.761905 | 14.1400 | 4.5
991 | 602-16-6955 | B | Mandalay | Normal | Female | Sports and travel | 76.60 | 10 | 38.3000 | 804.3000 | 1/24/2019 | 18:10 | Ewallet | 766.00 | 4.761905 | 38.3000 | 6.0
992 | 745-74-0715 | A | Yangon | Normal | Male | Electronic accessories | 58.03 | 2 | 5.8030 | 121.8630 | 3/10/2019 | 20:46 | Ewallet | 116.06 | 4.761905 | 5.8030 | 8.8
993 | 690-01-6631 | B | Mandalay | Normal | Male | Fashion accessories | 17.49 | 10 | 8.7450 | 183.6450 | 2/22/2019 | 18:35 | Ewallet | 174.90 | 4.761905 | 8.7450 | 6.6
994 | 652-49-6720 | C | Naypyitaw | Member | Female | Electronic accessories | 60.95 | 1 | 3.0475 | 63.9975 | 2/18/2019 | 11:40 | Ewallet | 60.95 | 4.761905 | 3.0475 | 5.9
995 | 233-67-5758 | C | Naypyitaw | Normal | Male | Health and beauty | 40.35 | 1 | 2.0175 | 42.3675 | 1/29/2019 | 13:46 | Ewallet | 40.35 | 4.761905 | 2.0175 | 6.2
996 | 303-96-2227 | B | Mandalay | Normal | Female | Home and lifestyle | 97.38 | 10 | 48.6900 | 1022.4900 | 3/2/2019 | 17:16 | Ewallet | 973.80 | 4.761905 | 48.6900 | 4.4
997 | 727-02-1313 | A | Yangon | Member | Male | Food and beverages | 31.84 | 1 | 1.5920 | 33.4320 | 2/9/2019 | 13:22 | Cash | 31.84 | 4.761905 | 1.5920 | 7.7
998 | 347-56-2442 | A | Yangon | Normal | Male | Home and lifestyle | 65.82 | 1 | 3.2910 | 69.1110 | 2/22/2019 | 15:33 | Cash | 65.82 | 4.761905 | 3.2910 | 4.1
999 | 849-09-3807 | A | Yangon | Member | Female | Fashion accessories | 88.34 | 7 | 30.9190 | 649.2990 | 2/18/2019 | 13:28 | Cash | 618.38 | 4.761905 | 30.9190 | 6.6